Exploration of a collection of documents in neuroscience and extraction of topics by clustering

نویسندگان

  • Antoine Naud
  • Shiro Usui
چکیده

This paper presents a preliminary analysis of the neuroscience knowledge domain, and an application of cluster analysis to identify topics in neuroscience. A collection of posters presented at the Society for Neuroscience (SfN) Annual Meeting in 2006 is first explored by viewing existing topics and poster sessions using multidimensional scaling. Based on the Vector Space Model, several Term Spaces were built on the basis of a set of terms extracted from the posters' abstracts and titles, and a set of free keywords assigned to the posters by their authors. The ensuing Term Spaces were compared from the point of view of retrieving the genuine category titles. Topics were extracted from the abstracts of posters by clustering the documents using a bisecting k-means algorithm and selecting the most salient terms for each cluster by ranking. The terms extracted as topic descriptors were evaluated by comparing them to existing titles assigned to thematic categories defined by human experts in neuroscience. A comparison of two approaches for terms ranking (Document Frequency and Log-Entropy) resulted in better performance of the Log-Entropy scores, allowing to retrieve 31.0% of original title terms in clustered documents (and 37.1% in original thematic categories).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

تحلیل موضوعی مقالات مرتبط با اعتیاد در پایگاه مدلاین به روش خوشه بندی سلسله مراتبی: 2014-1991

Introduction: Addiction, which has recently attracted the attention of researchers, is a serious problem worldwide. The growth of relevant literature contributes to a better understanding of this problem and improves the interaction between executive organizations and academic institutions. It is important to identify the active subject areas within this field and to explore the topics which ar...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting

This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), by which the optimum number of clusters is derived form a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in the Saveh, Markazi province of Iran, was investigated to discover porphyry ...

متن کامل

Exploration of a Text Collection and Identification of Topics by Clustering

Abstract. An application of cluster analysis to identify topics in a collection of posters abstracts from the Society for Neuroscience (SfN) Annual Meeting in 2006 is presented. The topics were identified by selecting from the abstracts belonging to each cluster the terms with the highest scores using different ranking schemes. The ranking scheme based on logentropy showed better performance in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Neural networks : the official journal of the International Neural Network Society

دوره 21 8  شماره 

صفحات  -

تاریخ انتشار 2008